Random Slowness with Western Digital Caviar Green Hard Drives
Thu, 14 Oct 2010 02:28
Several months ago, my MythTV server started acting strangely. At seemingly random intervals, the system would experience high IO, completely crippling the responsiveness of the system to the extent that it would take minutes to even establish an SSH connection to the machine. There appeared to be no way to trigger the problem and running my own high-IO tasks showed no issues. The high-IO periods lasted for at least several minutes but sometimes occurred days apart, and as they wrecked the machine's responsiveness, it was also practically impossible to monitor the system when the symptoms did appear.
Over the months, I tried changing the root file-system, searching for known IO issues with the Linux kernel, XFS or MythTV, checking fragmentation, swap-utilisation, IO-priority settings and running a multitude of different kernels including low-latency versions, all to no avail. The times I managed to SSH into the machine during a high-IO attack, running top and latencytop were equally unrevealing. It was only recently that I finally discovered the culprit: the Western Digital Caviar Green 1.5TB hard drive.
It appears that the drive randomly enters spells where IO slows radically for minutes at a time, then recovers and behaves normally until the next one. There are no other signs that anything is wrong with the drive, such as strange noises, bad sectors or SMART warnings. The drive model in question is a 'WDC WD15EADS-00P8B0' and is only just over a year old, which means the symptoms must have started manifesting not long after purchase.
As the problem isn't reproducible on demand, I can't give solid IO performance figures. What I can state is that when the problem has manifested at boot time, I've given up waiting after tens of minutes for a system that usually boots in less than one. I can't imagine that extensive amounts of data are read during boot, which makes me wonder if the problem might be due to seeking rather than actual read speeds. Either way, the result is an unusable system.
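For anyone wanting to capture figures for a problem like this, one option is to leave a small logger running in the background so the evidence is already on disk when the next slow spell hits. The following is only a rough sketch, not something used in the investigation above. It samples /proc/diskstats and records the average time each request took, calculated the same way as the 'await' figure from iostat. The device name, sampling interval, threshold and log path are all placeholder assumptions, and the log should ideally live on a different disk from the one being watched.

```python
import time


def parse_diskstats(text, device):
    """Return (completed IOs, ms spent on them) for `device`.

    `text` is the content of /proc/diskstats. After the major number,
    minor number and device name, the columns used here are reads
    completed, ms spent reading, writes completed and ms spent writing.
    """
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 11 and fields[2] == device:
            reads, ms_reading = int(fields[3]), int(fields[6])
            writes, ms_writing = int(fields[7]), int(fields[10])
            return reads + writes, ms_reading + ms_writing
    raise ValueError(f"device {device!r} not found in diskstats")


def average_wait_ms(before, after):
    """Average ms per completed IO between two parse_diskstats samples."""
    ios = after[0] - before[0]
    ms = after[1] - before[1]
    return ms / ios if ios > 0 else 0.0


def read_sample(device):
    """Take one sample for `device` from the live system."""
    with open("/proc/diskstats") as stats:
        return parse_diskstats(stats.read(), device)


def monitor(device="sda", interval=5.0, threshold_ms=100.0,
            log_path="/var/tmp/io-latency.log"):
    """Append a line to `log_path` whenever the average wait is high.

    All default values are illustrative assumptions, not recommendations.
    """
    previous = read_sample(device)
    while True:
        time.sleep(interval)
        current = read_sample(device)
        wait = average_wait_ms(previous, current)
        if wait >= threshold_ms:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            with open(log_path, "a") as log:
                log.write(f"{stamp} {device} average wait {wait:.1f} ms\n")
        previous = current
```

Calling monitor("sdb") from a script started at boot would then leave a timestamped record of each slow period. An interval in which no requests complete at all is reported as zero, so a total stall would only show up in the sample where the stuck requests finally finish.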
I've found postings on these issues here, here and here which indicate the issues affect only specific models. As far as I can tell, Western Digital doesn't seem to have acknowledged any issues with this drive, or provided any sort of firmware update that might fix it. As the symptoms don't match conventional signs of drive failure (it's unclear if the problem is even physical), I can only imagine how many others are experiencing these issues without realising the cause.
I'm not going to condemn Western Digital drives outright. The 2.5" 'WDC WD3200BEVT-35ZCT1' SATA drive in my laptop has been absolutely fine and is also the first laptop hard-drive I've owned that is so quiet that I actually need to look at the activity LED to judge IO load. However, I find it disappointing that Western Digital seems to have done so little to acknowledge what is clearly an issue affecting a number of people.
Now let's see if I can get Western Digital to give me an RMA code.